PLOS Digital Health

88 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Development and automated deployment of a specialised machine learning schema within a collaborative research centre: an explorative approach using large language models
2025-10-07 health systems and quality improvement 10.1101/2025.10.06.25337418
#1 (31.2%)

Achieving interoperability in machine learning (ML) workflows remains a significant challenge due to the heterogeneity of data types, algorithms, and application domains, as well as the lack of standardized metadata. In this study, we present the development of a specialized ML metadata schema within the context of the Small Data Initiative, a Collaborative Research Center characterized by diverse scientific approaches. We employed an interdisciplinary process combining expert input, iterative r...

2
Multidimensional Evaluation of Large Language Models on the AAP In-Service Examination: Assessing Accuracy, Calibration, and Citation Reliability
2025-10-17 dentistry and oral medicine 10.1101/2025.10.14.25338040
#1 (26.6%)

Background: Large language models (LLMs) have demonstrated rapid advancements in natural language understanding and generation, prompting their integration into biomedical research, clinical practice, and professional education. However, systematic evaluation of LLMs in specialty-specific domains such as dentistry and periodontology remains limited, particularly regarding multidimensional performance metrics. Objective: To conduct a comprehensive, multidimensional assessment of commercially availabl...

3
ENTAgents: AI Agents for Complex Knowledge Otolaryngology
2025-01-07 otolaryngology 10.1101/2025.01.01.25319863
#1 (24.6%)

Various healthcare applications based on large language models (LLMs) have emerged as LLMs show improved efficiency and error reduction. Recently, retrieval augmented generation (RAG) has been adopted frequently for LLM applications to solve the problem of hallucinations. Despite the success of RAG, it has its drawbacks, including incomplete semantic meanings, and large-scale dataset requirements. AI Agents have shown great potential in medicine and healthcare applications by leveraging their ri...

4
Machine Learning for Paediatric Related Decision Support in Emergency Care - A UK and Ireland Network Survey Study
2025-06-30 emergency medicine 10.1101/2025.06.29.25330501
#1 (24.5%)

This study explores clinician understanding and perception at site lead level towards machine learning (ML) decision support tools for paediatric related emergency care across the UK and Ireland, essential in guiding safe and effective frontline implementation. A cross-sectional online survey was distributed via Paediatric Emergency Research United Kingdom and Ireland (PERUKI) to the lead for digital systems or PERUKI site lead, with one response sought per site. Survey development was in REDCap...

5
How effective is generative AI advice for the academic advancement of faculty?
2025-05-26 medical education 10.1101/2025.05.25.25328317
#1 (24.4%)

The SingHealth Duke-NUS Academic Medical Center manages over 2,800 clinical faculty members and processes over 400 appointments and promotions annually. The current Promotion and Tenure documentation includes over 30 documents, making it difficult and time-consuming for the faculty to locate specific appointment information. We developed "AskADD" in response to requests for clearer academic career development guidance. This study reports initial alpha testing and subsequent beta testing with 35 ...

6
Developing a GraphRAG-enabled local-LLM for Gestational Diabetes Mellitus.
2025-04-30 endocrinology 10.1101/2025.04.28.25326568
#1 (23.6%)

This paper re-imagines a world of abundance in the treatment of chronic diseases such as Type 2 Diabetes. It asks: what if preventive and diagnostic remedies were widely made available across the world, informed by the latest medical research? As Proof-of-Concept of a proposed solution, the paper describes the development and validation of a local Large Language Model (local-LLM) based on Graph-based Retrieval-Augmented Generation (GraphRAG) for managing Gestational Diabetes Mellitus (GDM). The...

7
Performance of Multimodal GPT-4V on USMLE with Image: Potential for Imaging Diagnostic Support with Explanations
2023-10-26 radiology and imaging 10.1101/2023.10.26.23297629
#1 (23.6%)

Background: Using artificial intelligence (AI) to help clinical diagnoses has been an active research topic for more than six decades. Past research, however, has not had the scale and accuracy for use in clinical decision making. The power of AI in large language model (LLM)-related technologies may be changing this. In this study, we evaluated the performance and interpretability of Generative Pre-trained Transformer 4 Vision (GPT-4V), a multimodal LLM, on medical licensing examination questions...

8
Evaluating Few-Shot Prompting for Spectrogram-Based Lung Sound Classification Using a Multimodal Language Model
2025-07-28 respiratory medicine 10.1101/2025.07.27.25332255
#1 (23.6%)

Introduction: Traditional deep learning models for lung sound analysis require large, labeled datasets; multimodal LLMs may offer a flexible, prompt-based alternative. This study aimed to evaluate the utility of a general-purpose multimodal LLM, GPT-4o, for lung sound classification from mel-spectrograms and assess whether a few-shot prompt approach improves performance over zero-shot prompting. Methods: Using the ICBHI 2017 Respiratory Sound Database, 6898 annotated respiratory cycles were convert...

9
An electronic application to improve management of infections in low-income neonatal units: pilot implementation of the NeoTree Beta App in a public sector hospital in Zimbabwe.
2020-09-27 health systems and quality improvement 10.1101/2020.09.25.20201467
#1 (23.4%)

There are 2.9 million annual neonatal deaths worldwide. Simple, evidence-based interventions such as temperature control could prevent approximately two-thirds of these deaths. However, key problems in implementing these interventions are a lack of newborn-trained healthcare workers and a lack of data collection systems. NeoTree is a digital platform aiming to improve newborn care in low-resource settings through real-time data capture and feedback alongside education and data linkage. This proj...

10
PREFER-IT: A transdisciplinary co-created framework to realise inclusive medical AI
2025-11-06 health informatics 10.1101/2025.11.03.25339443
#1 (23.2%)

Artificial intelligence (AI) in healthcare holds transformative potential but risks exacerbating existing health disparities if inclusivity is not explicitly accounted for. This study addresses the disconnected discussions on inclusive medical AI by developing a comprehensive framework, PREFER-IT. This framework is based on the outcomes of a five-day transdisciplinary co-creation workshop that involved 37 experts from diverse backgrounds, including healthcare, ethics, law, social sciences, AI, a...

11
Beyond Accuracy: A Cost-Aware Approach to Skin Lesion Detection Across Skin Tone Imbalances
2024-12-12 health systems and quality improvement 10.1101/2024.12.11.24318858
#1 (23.2%)

Skin lesion prediction using artificial intelligence (AI) models is highly dependent on skin tone, yet current approaches largely overlook this critical factor. The Fitzpatrick 17k dataset, which contains six skin tone categories: lighter to darker, is severely imbalanced, with most models biased toward lighter skin tones. Previous efforts to improve overall accuracy fall short: overall accuracy fails to reflect true performance across imbalances. This creates a significant gap, as effective ski...

12
Authors self-disclosed use of artificial intelligence in research submissions to 49 biomedical journals: A cross-sectional study
2025-10-25 health informatics 10.1101/2025.10.24.25338574
#1 (22.9%)

OBJECTIVE: To analyze the frequency of self-disclosed use of AI in research manuscripts submitted to 49 biomedical journals and to identify types of AI tools used, the tasks they assisted with, and factors associated with disclosure. DESIGN: Cross-sectional study. SETTING: 49 biomedical journals published by BMJ Group. PARTICIPANTS: Submitting authors of 25,114 empirical research manuscripts including systematic reviews and meta-analyses, submitted between 8 April 2024 and 6 November 2024. MAIN OUTC...

13
Large Language Models (LLMs) and Empathy - A Systematic Review
2023-08-07 health informatics 10.1101/2023.08.07.23293769
#1 (22.8%)

Purpose: Empathy, a cornerstone of human interaction, is a quality unique to humans that Large Language Models (LLMs) are believed to lack. Our study aims to review the literature on the capacity of LLMs to demonstrate empathy. Methods: We conducted a literature search on MEDLINE up to July 2023. Seven publications ultimately met the inclusion criteria. Results: All studies included in this review were published in 2023. All studies but one focused on ChatGPT-3.5 by OpenAI. Only one study evaluated...

14
Robodoc: a conversational-AI based app for medical conversations
2024-01-02 health informatics 10.1101/2023.12.31.23300681
#1 (22.6%)

Background: Artificial Intelligence (AI) has evolved through various trends, with different subfields gaining prominence over time. Currently, Conversational Artificial Intelligence (CAI), particularly Generative AI, is at the forefront. CAI models are primarily focused on text-based tasks and are commonly deployed as chatbots. Recent advancements by OpenAI have enabled the integration of external, independently developed models, allowing chatbots to perform specialized, task-oriented functions be...

15
Development of an AI-enabled predictive model to identify the 'sick child' at a pediatric telemedicine and medication delivery service in Haiti
2025-06-28 pediatrics 10.1101/2025.06.27.25330413
#1 (22.5%)

Background: One of the most difficult challenges in pediatric telemedicine is to accurately discriminate between the sick and not sick child, especially in resource-limited settings. Models that flag potentially sick cases for additional safety checks represent an opportunity for telemedicine to reach its potential. However, there are critical knowledge gaps on how to develop such models and integrate them into electronic clinical decision support (eCDS) tools. Methods: To address this challenge...

16
A conversational agent for providing personalized PrEP support: Protocol for chatbot implementation
2025-05-05 health informatics 10.1101/2025.05.02.25326894
#1 (22.3%)

Background: Chatbots have the potential to reduce barriers to pre-exposure prophylaxis (PrEP), including lack of awareness, misconceptions, and stigma, by providing anonymous and continuous support. However, in the context of PrEP, chatbots are still nascent; they lack personalized informational expertise, peer experiential expertise, and human-like emotional support to promote PrEP uptake and retention. Tailoring information, providing relatable peer experiences, and offering effective emotional s...

17
Microsoft Bing outperforms five other generative artificial intelligence chatbots in the Antwerp University multiple choice medical license exam
2023-08-21 health informatics 10.1101/2023.08.18.23294263
#1 (22.2%)

Recently developed chatbots based on large language models (further called bots) have promising features which could facilitate medical education. Several bots are freely available, but their proficiency has been insufficiently evaluated. In this study the authors have tested the current performance on the multiple-choice medical licensing exam of University of Antwerp (Belgium) of six widely used bots: ChatGPT (OpenAI), Bard (Google), New Bing (Microsoft), Claude instant (Anthropic), Claude+ (A...

18
Tracking the tension: Examining emotional conflict experienced in wearable activity tracker users.
2025-12-08 health informatics 10.64898/2025.12.03.25341327
#1 (22.0%)

Wearable activity trackers have been recognised as effective tools for physical activity promotion, leading to their integration in healthcare services. However, some qualitative literature has indicated that device users may experience emotional conflict. The current study is the first, to our knowledge, to directly examine the conflict faced by wearable activity tracker users. A qualitative, exploratory design was followed, with inductive thematic analysis conducted on semi-structured interview tr...

19
How Large Language Models Perform on the United States Medical Licensing Examination: A Systematic Review
2023-09-03 health informatics 10.1101/2023.09.03.23294842
#1 (21.7%)

Objective: The United States Medical Licensing Examination (USMLE) assesses physicians' competency, and passing is a requirement to practice medicine in the U.S. With the emergence of large language models (LLMs) like ChatGPT and GPT-4, understanding their performance on these exams illuminates their potential in medical education and healthcare. Materials and Methods: A literature search following the 2020 PRISMA guidelines was conducted, focusing on studies using official US...

20
RAGCare-QA: A Benchmark Dataset for Evaluating Retrieval-Augmented Generation Pipelines in Theoretical Medical Knowledge
2025-08-16 health informatics 10.1101/2025.08.15.25333718
#1 (21.5%)

The paper introduces RAGCare-QA, an extensive dataset of 420 theoretical medical knowledge questions for assessing Retrieval-Augmented Generation (RAG) pipelines in medical education and evaluation settings. The dataset includes one-choice-only questions from six medical specialties (Cardiology, Endocrinology, Gastroenterology, Family Medicine, Oncology, and Neurology) with three levels of complexity (Basic, Intermediate, and Advanced). Each question is accompanied by the best fit of RAG impleme...